Variable selection for multiply-imputed data with application to dioxin exposure study.

Authors

  • Qixuan Chen
  • Sijian Wang
Abstract

Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes it difficult to interpret the final model or draw scientific conclusions. In this paper, we propose a novel multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) variable selection method as an extension of the least absolute shrinkage and selection operator (LASSO) method to multiply-imputed data. The MI-LASSO method treats the estimated regression coefficients of the same variable across all imputed datasets as a group and applies the group LASSO penalty to yield a consistent variable selection across multiply-imputed datasets. We use a simulation study to demonstrate the advantage of the MI-LASSO method compared with the alternatives. We also apply the MI-LASSO method to the University of Michigan Dioxin Exposure Study to identify important circumstances and exposure factors that are associated with human serum dioxin concentration in Midland, Michigan.
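
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared-error loss summed over the imputed datasets plus a penalty `lam` times the Euclidean norm of each variable's coefficients across datasets, and it uses a generic proximal-gradient solver chosen here for brevity. The names `mi_lasso_sketch`, `X_list`, `y_list`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def mi_lasso_sketch(X_list, y_list, lam, n_iter=500):
    """Illustrative group-penalised least squares across D imputed datasets.

    X_list: list of D arrays of shape (n, p), one per imputed dataset.
    y_list: list of D arrays of shape (n,).
    Returns B of shape (D, p); row d holds the coefficients for dataset d.
    Assumed objective (not taken from the paper's text):
        sum_d 0.5 * ||y_d - X_d b_d||^2 + lam * sum_j ||B[:, j]||_2
    """
    D = len(X_list)
    p = X_list[0].shape[1]
    # Centre predictors and outcome so that no intercept term is needed.
    Xc = [X - X.mean(axis=0) for X in X_list]
    yc = [y - y.mean() for y in y_list]
    # The smooth part is separable over datasets, so its Lipschitz constant
    # is the largest squared spectral norm among the design matrices.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xc)
    B = np.zeros((D, p))
    for _ in range(n_iter):
        # Gradient of the squared-error loss, one row per imputed dataset.
        G = np.stack([X.T @ (X @ b - y) for X, y, b in zip(Xc, yc, B)])
        Z = B - step * G
        # Group soft-thresholding: column j holds variable j's coefficients
        # in all D datasets and is shrunk, or zeroed, as a single unit.
        norms = np.linalg.norm(Z, axis=0)
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = Z * scale
    return B
```

Because the threshold acts on a whole column of `B` at once, a variable is either dropped from every imputed dataset or kept in all of them, which is the consistency property the abstract describes. In practice `lam` would need to be tuned, and the abstract does not say how the authors do this.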

Similar resources

Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets

Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...

Order selection tests with multiply imputed data

Nonparametric tests for the null hypothesis that a function has a prescribed form are developed and applied to data sets with missing observations. Omnibus nonparametric tests such as the order selection tests, do not need to specify a particular alternative parametric form, and have power against a large range of alternatives. More specifically, likelihood-based order selection tests are defin...

Association between Environmental Dioxin-Related Toxicants Exposure and Adverse Pregnancy Outcome: Systematic Review and Meta-Analysis.

Dioxin-related compounds are associated with teratogenic and mutagenic risks in laboratory animals, and result in adverse pregnancy outcomes. However, there were inconsistent results in epidemiology studies. In view of this difference, we conducted a systematic review and meta-analysis to examine this association and to assess the heterogeneity among studies. Comprehensive literature searches w...

Multiple imputation and other resampling schemes for imputing missing observations

The problem of imputing missing observations under the linear regression model is considered. It is assumed that observations are missing at random and all the observations on the auxiliary or independent variables are available. Estimates of the regression parameters based on singly and multiply imputed values are given. Jackknife as well as bootstrap estimates of the variance of the singly im...

Development and validation of a prediction model with missing predictor data: a practical approach.

OBJECTIVE To illustrate the sequence of steps needed to develop and validate a clinical prediction model, when missing predictor values have been multiply imputed. STUDY DESIGN AND SETTING We used data from consecutive primary care patients suspected of deep venous thrombosis (DVT) to develop and validate a diagnostic model for the presence of DVT. Missing values were imputed 10 times with th...

Journal title:
  • Statistics in Medicine

Volume 32, Issue 21

Pages: -

Publication date: 2013